A Pattern Extraction Workbench Combining Multiple Linguistic Levels
نویسندگان
چکیده
In this paper an interactive pattern extraction workbench, I*Pex, is presented. The workbench comes in a graphical environment and is designed to be used in an incremental and interactive fashion with the user. Patterns can be constructed to work in combination involving specifications on several linguistic levels simultaneously, from the character level using regular expressions, parts of speech and dependency relations to semantic roles. The input text format is based on XCES XML format.
منابع مشابه
Combinaison d'approches pour l'extraction automatique d'événements (Automatic events extraction by combining multiple approaches) [in French]
Automatic events extraction by combining multiple approaches In this paper, we present an automatic system for extracting events based on the combination of two existing information extraction approaches : the first one is made of hand-crafted linguistic rules and the second one is based on an automatic learning of linguistic patterns. We have shown that this mixed approach leads to a significa...
متن کاملA Linguistically Grounded Graph Model for Bilingual Lexicon Extraction
We present a new method, based on graph theory, for bilingual lexicon extraction without relying on resources with limited availability like parallel corpora. The graphs we use represent linguistic relations between words such as adjectival modification. We experiment with a number of ways of combining different linguistic relations and present a novel method, multi-edge extraction (MEE), that ...
متن کاملMedLex+: An Integrated Corpus-Lexicon Medical Workbench for Swedish
This paper reports on the work carried out developing MedLex+, a medical corpuslexicon workbench for Swedish. This project, which is still under active development, has been going on for some years now within the Department of Swedish language at Göteborg University. At the moment, the workbench incorporates: an annotated collection of medical texts-including 20 million tokens and 45,000 docume...
متن کاملAn Extensive Empirical Study of Collocation Extraction Methods
This paper presents a status quo of an ongoing research study of collocations – an essential linguistic phenomenon having a wide spectrum of applications in the field of natural language processing. The core of the work is an empirical evaluation of a comprehensive list of automatic collocation extraction methods using precision-recall measures and a proposal of a new approach integrating multi...
متن کاملA Linguistic Search Tool for Semitic Languages
The paper discusses searching a corpus for linguistic patterns. Semitic languages have complex morphology and ambiguous writing systems. We explore the properties of Semitic Languages that challenge linguistic search and describe how we used the Corpus Workbench (CWB) to enable linguistic searches in Hebrew corpora.
متن کامل